ABSTRACT. - Biodiversity studies produce large amounts of data, but few datasets are well documented, compiled,
readily available and easily accessible. The development of a specific database for the Kerguelen Plateau is an important
step towards providing data and analyses to support research and fisheries management. SIMPA is a database that covers
all data from MNHN scientific expeditions, collections and fisheries in the French Sub-Antarctic Territories. The use of
reference tables and standardized computer-language procedures opens the data to exchange and links with other databases
such as the Global Biodiversity Information Facility (GBIF) and FishBase. The data already integrated into SIMPA provide
information on research expeditions from 1874 to the present and on fisheries from 1979 to the present. The next step is to
encode data on the benthic marine invertebrates of the Kerguelen Plateau. This will improve the features of SIMPA, making
it a desirable tool for fisheries management and ecosystem modelling in the region.


RÉSUMÉ. - SIMPA: a tool for fisheries management and ecosystem modelling.

Biodiversity studies generate a large body of data. However, very few of these data are compiled, accessible and easy to
obtain. The development of an information system specific to the Kerguelen Plateau is an important step towards providing
data and analyses to support research and fisheries management. SIMPA is an information system that includes all data of
the Muséum national d'Histoire naturelle on scientific expeditions, collections and fisheries of the French Sub-Antarctic
Territories. The use of reference tables and of standardized computer language opens the data to exchanges and links with
other databases such as the Global Biodiversity Information Facility (GBIF) and FishBase. The data already integrated into
SIMPA provide information on research expeditions from 1874 to the present and on fisheries from 1979 to today. The next
step in the development of this information system is the integration of data on the benthic marine invertebrates of the
Kerguelen Plateau. This step will extend the functionality of SIMPA, making it a valuable tool for fisheries management
and ecosystem modelling in the region.
An important legacy of biodiversity studies is the large
number of scientific publications produced and the diverse
specimens collected. For example, prominent institutions
such as the National Museum of Natural History (MNHN) in
Paris have gathered publications and millions of specimens
from throughout the world over more than 200 years of
scientific research.

However, there are relatively few initiatives to preserve
and make accessible the data generated from these studies.
As a result, fewer institutional resources are devoted to
retaining and managing data over the long term, and datasets
may be kept in a very heterogeneous way and stored in
different formats, making them less accessible.

The purpose of the project Système d'Information des
Milieux et Peuplements Aquatiques (SIMPA) is to develop a
database able to centralize and store all primary datasets
available from the French Sub-Antarctic Territories.

This system will provide access to standardized data
across a range of secure platforms (analysis programs and
operating systems) by a large user community. Reference
and standardized datasets can be used towards improved
fisheries management and ecosystem modeling.

The SIMPA project is complemented by other national
and international initiatives. Several projects focus on the
inventory and storage of datasets for a better understanding
of biodiversity and the impact of global environmental
change. The national initiatives, supported by the National
Biodiversity Observatory Network, include: the inventory of
existing datasets, databases and data structures under the
Système d'Information Nature et Paysage (SINP), a French
initiative to collect and digitize such information
(http://www.naturefrance.fr/); and the Observatory and
Experimentation Systems for Environmental Research
(Systèmes d'Observation et d'Expérimentation, sur le long terme,
pour la Recherche en Environnement, SOERE), which are
programs based on the collection of temporal biodiversity
data and experimentation, supported by the Alliance pour la
recherche en environnement (http://www.allenvi.fr).
International initiatives have developed a number of
databases that make biodiversity information readily
available, primarily through the internet. These include
global biodiversity databases such as the Catalogue of Life
(http://www.catalogueoflife.org/), the World Register of Marine
Species (WoRMS; http://www.marinespecies.org; Appeltans
et al., 2010) and the European Register of Marine Species
(ERMS; http://www.marbef.org/data/erms.php; Costello
et al., 2008), as well as national lists such as Ref-Tax for the
French territory. There are also information systems such as
FishBase (www.fishbase.org; Froese and Pauly, 2011), SealifeBase
(www.sealifebase.org; Palomares and Pauly, 2011), and
AlgaeBase (http://www.algaebase.org/; Guiry and Guiry,
2011). Species inventories also exist globally, through the
Global Biodiversity Information Facility (GBIF); regionally,
through the Scientific Committee on Antarctic Research
Marine Biodiversity Information Network (ScarMarBin;
http://www.scarmarbin.be/) and the Ocean Biogeographic
Information System (OBIS; http://www.iobis.org/);
nationally or locally, as in France through the
Inventaire National du Patrimoine Naturel (INPN); or
thematically, as in the Atlas des pêcheries françaises.

MATERIALS AND METHODS 

The first step in organizing SIMPA was to establish
a computer centre to provide a long-term and secure
infrastructure. The next step was to develop the structure of the
database by analysing all datasets and developing different
databases following the thematic structure of the data.
Reference databases for taxonomic data and for oceanographic
cruises were also developed. The third step was to define and
use a format based on international standards that enables
information exchanges. Finally, a secure portal will be
created to provide access to information (Fig. 1).

Development of SIMPA 

Database concept 

The dataset included in SIMPA concerns four different
themes. We decided to develop and connect three different
databases using the same language and standard: GICIM
(Gestion Informatisée des Collections d'Ichtyologie du
Muséum) for the MNHN ichthyology collections data;
Pecheker (Pêcheries de Kerguelen) for the fisheries data;
and BasExp (Base Expéditions) for the descriptions of
scientific expeditions or programs and for the biological data
from scientific cruises.

Reference tables 

Reference tables contain common data shared by the
databases (e.g., species names, country names), which are defined
according to standard procedures: scientific names follow
the zoological convention on taxonomic names, and
country names follow the International Organization for
Standardization (ISO).
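
The idea of reference tables shared by several databases can be sketched as follows. This is a minimal illustration using SQLite rather than the Oracle system described later; all table and column names, the use of ISO 3166-1 alpha-2 country codes, and the catch weight are assumptions made for the example, not the actual SIMPA schema.

```python
import sqlite3

# Hypothetical schema: two reference tables shared by a collection table
# and a fisheries table. Names are illustrative, not the SIMPA schema.
SCHEMA = """
CREATE TABLE ref_taxon (
    taxon_id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE
);
CREATE TABLE ref_country (
    iso_alpha2 TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE collection_specimen (
    catalogue_number TEXT PRIMARY KEY,
    taxon_id INTEGER NOT NULL REFERENCES ref_taxon(taxon_id)
);
CREATE TABLE fishery_catch (
    catch_id INTEGER PRIMARY KEY,
    taxon_id INTEGER NOT NULL REFERENCES ref_taxon(taxon_id),
    flag_country TEXT NOT NULL REFERENCES ref_country(iso_alpha2),
    weight_kg REAL
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with one entry in each table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the reference links
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO ref_taxon VALUES (1, 'Channichthys rhinoceratus')")
    conn.execute("INSERT INTO ref_country VALUES ('FR', 'France')")
    conn.execute("INSERT INTO collection_specimen VALUES ('MNHN 1890-0100', 1)")
    # The catch record below is invented purely for illustration.
    conn.execute("INSERT INTO fishery_catch VALUES (1, 1, 'FR', 12.5)")
    conn.commit()
    return conn


def taxa_in_both(conn: sqlite3.Connection) -> list:
    """Return names of taxa present in both the collection and the fishery."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.scientific_name
        FROM ref_taxon t
        JOIN collection_specimen s ON s.taxon_id = t.taxon_id
        JOIN fishery_catch c ON c.taxon_id = t.taxon_id
        ORDER BY t.scientific_name
        """
    )
    return [name for (name,) in rows]
```

Because both data tables point to the same `ref_taxon` row, a name is stored once and a record that cites an unknown taxon or country is rejected by the foreign-key constraint.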

Taxonomic reference table 

The most important element in a biodiversity database is
its taxonomic backbone. Taxonomic databases like the
Catalogue of Life and WoRMS integrate information systems
like FishBase or AlgaeBase which keep track of the
evolution of taxonomic names. SIMPA will use WoRMS as its
taxonomic reference.

Scientific cruises reference table 

This table provides data on all collection and observation 
surveys. This table should give the opportunity to easily link 
datasets from different databases from the same cruise. 
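
As a rough sketch of how a cruise reference table can link records held in different databases, the function below groups records under each known cruise code. The record layout and the `cruise` key are hypothetical; only the cruise codes MD03 and MD04 come from the text.

```python
from collections import defaultdict


def link_by_cruise(cruise_codes, datasets):
    """Group records from several databases under each known cruise code.

    cruise_codes: iterable of codes from the (hypothetical) cruise reference table.
    datasets: mapping of database name -> list of record dicts with a 'cruise' key.
    Returns (linked, unmatched), where unmatched lists records whose cruise
    code is missing from the reference table.
    """
    linked = {code: defaultdict(list) for code in cruise_codes}
    unmatched = []
    for database_name, records in datasets.items():
        for record in records:
            code = record.get("cruise")
            if code in linked:
                linked[code][database_name].append(record)
            else:
                unmatched.append((database_name, record))
    return linked, unmatched


# Illustrative records only; lot and station identifiers are invented.
example_datasets = {
    "GICIM": [{"cruise": "MD03", "lot": "lot-A"}],
    "BasExp": [
        {"cruise": "MD03", "station": 1},
        {"cruise": "MD04", "station": 2},
        {"cruise": "unknown", "station": 3},
    ],
}
linked, unmatched = link_by_cruise({"MD03", "MD04"}, example_datasets)
```

Records with a code absent from the reference table are returned separately, so they can be checked by hand rather than silently dropped.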





Geographic names reference table 

In SIMPA, a database of standard geographic names is 
used to provide systems interoperability based on location, 
habitats, hydrographic features or oceanographic basins. 

Scientists reference table 

SIMPA will provide data about the scientists who participated
in the cruise or those who studied the collected samples.
The scientists name table will be linked to the FishBase
and SealifeBase projects.





Figure 1. - Steps to create an aquatic biodiversity database, “Système
d'Information des Milieux et Peuplements Aquatiques”.


Data input 

SIMPA is supported by Oracle®-based software
maintained by the information technology (IT) staff of
the MNHN, who also handle security and archiving. Data
are transferred through the MNHN software called
Jacim®. Inputting and optimizing data requires several steps,
shown in figure 2: it is necessary to inventory the dataset,
to set the data priority, and to analyze and standardize
heterogeneous data.

Figure 2. - Data input process to the database “SIMPA” (steps include: 3. Analyse, standardise, validate; 4. Input in a database).


Inventory of the dataset 

This step concerns the inventory of the subject-related data
available. The dataset must be described and characterized.
The description of the dataset must detail the availability of
the data and the level of digitization (e.g., are the data already
in a computer file or still on paper sheets). This data inventory
must also give information on the theme, the study area
and the time when the dataset was collected. The description
must give an idea of the work already done on the
data and of what still needs to be done. Are the data in a
homogeneous format? Are they standardized? Are they exhaustive?
Free of copyright? All data must be validated by a scientific
expert.

This inventory is then provided to the SINP and the 
GBIF. The inventory must be considered as a picture of the 
state of the data availability at a particular time, which may 
help define the priority and the resources necessary to fully 
include the data in SIMPA. 
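
One possible way to record such an inventory entry is sketched below. The field names and the wording of the remaining tasks are assumptions derived from the questions listed above, not an existing SIMPA, SINP or GBIF format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetInventoryEntry:
    """Hypothetical inventory record describing one candidate dataset."""

    name: str
    theme: str
    study_area: str
    period: str
    digitized: bool
    homogeneous_format: bool
    standardized: bool
    exhaustive: bool
    copyright_free: bool
    validated_by_expert: bool

    def remaining_work(self) -> list[str]:
        """List the tasks still needed before the dataset can be included."""
        checks = [
            (self.digitized, "digitize paper records"),
            (self.homogeneous_format, "convert to a homogeneous format"),
            (self.standardized, "standardize values"),
            (self.exhaustive, "complete missing records"),
            (self.copyright_free, "clarify copyright"),
            (self.validated_by_expert, "obtain validation by a scientific expert"),
        ]
        return [task for done, task in checks if not done]


# Invented example entry, for illustration only.
example_entry = DatasetInventoryEntry(
    name="example cruise logbook",
    theme="ichthyology",
    study_area="Kerguelen Plateau",
    period="1974",
    digitized=False,
    homogeneous_format=True,
    standardized=False,
    exhaustive=True,
    copyright_free=True,
    validated_by_expert=True,
)
```

Calling `example_entry.remaining_work()` gives a short to-do list, which is one way the inventory can feed the estimate of priority and resources.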


Data priority 

During this step, it is necessary to choose the
datasets to be included in the database. The choice depends on
factors such as dataset availability, complexity and relevance,
as well as the resources necessary to obtain each dataset and
include it in the database.

Standardization and data input 

Any heterogeneous dataset must be standardized before
it is entered in the database and eventually shared with other
databases. Standardization refers to procedures applied to
data extracted from their source. The last step is the encoding
of the result of this process in a data field.
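
A standardization procedure of this kind might look like the sketch below, which converts dates and positions written in several source styles to a single form (ISO 8601 dates, signed decimal degrees). The accepted input formats are assumptions chosen for the example; the text does not list the formats actually encountered.

```python
import re
from datetime import datetime

# Assumed source date styles; extend as the inventory reveals new ones.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

# Degrees and decimal minutes followed by a hemisphere letter, e.g. 49°30'S.
DEG_MIN_PATTERN = re.compile(
    r"^\s*(\d{1,3})[°\s]+(\d{1,2}(?:\.\d+)?)['′\s]*([NSEW])\s*$",
    re.IGNORECASE,
)


def standardize_date(raw: str) -> str:
    """Return the date as an ISO 8601 string (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def standardize_coordinate(raw: str) -> float:
    """Return a latitude or longitude in signed decimal degrees.

    South and west are negative. Input is either degrees and minutes with a
    hemisphere letter, or an already signed decimal number.
    """
    match = DEG_MIN_PATTERN.match(raw)
    if match:
        degrees, minutes, hemisphere = match.groups()
        if float(minutes) >= 60:
            raise ValueError(f"minutes out of range: {raw!r}")
        value = int(degrees) + float(minutes) / 60.0
        if hemisphere.upper() in ("S", "W"):
            value = -value
        return round(value, 6)
    return round(float(raw.strip()), 6)
```

For instance, `standardize_date("12/03/1974")` and `standardize_coordinate("49°30'S")` reduce two differently written source values to the forms stored in the data fields. Values that match no known format raise an error instead of being guessed.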


RESULTS 

The work done on the data from the Kerguelen Plateau
offered the opportunity to update the collection database
GICIM, to develop the fisheries database Pecheker and to
integrate all ichthyological records from scientific cruises in
the new BasExp database. All specimens from the MNHN
ichthyology collection are in SIMPA and are already available
on the GBIF and ScarMarBin portals. Data from over
92 000 stations have been encoded. Stations mainly cover
fisheries activities. Each station record includes date, time, start
and end coordinates, station depth and the depth of the gear.

Currently, only 10% of these data come from scientific expeditions,
and a total of 12 000 specimens from the Kerguelen Plateau
provide information on 120 taxa. For many specimens,
available data include length, weight and sexual maturity.

Gestion des Collections d'Ichtyologie du Muséum
(GICIM) Database

GICIM was created in 1982 to house all data from the 
MNHN ichthyology collection (logged specimens). A few 
uninventoried specimens are not yet in GICIM. 

GICIM holds not only specimen data such as size,
sex or locality but also additional data such as images
or information on requests and loans. GICIM can integrate
all data required for collection management. We distinguish
several modules (Fig. 3): description data [this module
includes all the tables, both physical (size, weight, status)
and descriptive (capture sites and taxonomic names)]; data
management and accessibility events (this module allows
management of all applications and items related to a
specimen); conservation of specimens (places of storage,
container type); images obtained from the specimen; and related
collections [this module identifies and manages any type of
subset of collections obtained from the reference specimens
(such as a tissue sample or the otoliths)].

Figure 3. - Conceptual scheme of the database “GICIM”. 1:
Description data; 2: Data management; 3: Preservation status; 4:
Images data; 5: Related collections.

Figure 4. - Location of MNHN collection specimens from the
Kerguelen Plateau on the Google Maps website.

The GICIM database contains more than 500 000 specimens
representing over 13 000 different taxa.

Since 1874, 2262 specimens from the Kerguelen Plateau,
representing 120 different taxa from 1740 stations, have
been placed in collections. The distribution of these specimens
covers a very large part of the Kerguelen Plateau. Data
are available from the website of the MNHN
(www.mnhn.fr/collections/gicim) and can be viewed on geographic
mapping websites such as Google Maps® (Fig. 4) or Bing®.

The collection of new fish specimens has been closely
linked to scientific expeditions of the French Polar Institute
(Institut Paul Emile Victor, IPEV). The growth of the collection
continues mainly through the work of fishing observers, with
the support of shipowners, who regularly ship new specimens
caught as bycatch.

The first five specimens in the collection from the
Kerguelen Islands, representing three taxa, were captured in
1874 by the Challenger expedition in Crozet and Kerguelen:
MNHN 1890-0100: Channichthys rhinoceratus Richardson, 1844;
MNHN 1890-0101 to 1890-0103: Lepidonotothen mizops
(Günther, 1880); MNHN 1890-0104: Notothenia
cyanobrancha Richardson, 1844.

A significant improvement of the collections appeared
only 100 years later, when a permanent French scientific station
was established in Port-aux-Français on the main island,
through the Delepine, Reneau and Hureau expeditions from
1963 to 1966. A total of 358 lots were added to the collection
and 40 species were recorded. In 1974, the MD03 cruise
(Hureau, 1976), on board the Marion-Dufresne, greatly
enriched the MNHN ichthyology collection and provided, for
the first time, specimens from all over the shelf. This collecting
effort was carried on to the 1990s by several scientific
surveys using beam trawl (MD04 in 1975: Guille, 1977) or
bottom trawls (Skif and Kalper cruises Skalp 1987 and 1988:
Duhamel, 1993), and from 1995 to 1999 with the Icoker
(Ichtyologie côtière de Kerguelen), Ipeker (Ichtyologie pélagique
de Kerguelen), and Ichtyoker (Ichtyologie de Kerguelen)
programs on La Curieuse, using both the International
Young Gadoid Pelagic Trawl (IYGPT) and bottom trawl.
Since then, sampling has taken place from the catches of
fishing vessels (Fig. 5).

The sampling effort has been strongly correlated with the
availability of scientific vessels in the second part of the 20th
century. Fish samples clearly show that, before 1970, collection
was confined to the coastal area of the Gulf of Morbihan, using
the small vessel La Japonaise. The survey area
has since expanded with the cruises of the Marion-Dufresne in
the 1970s and La Curieuse in the 1990s (Fig. 6).

Figure 5. - Annual evolution of the
MNHN fish collection from 1874
to today. Lot number (left axis) and
number of taxa (right axis).

Figure 6. - Distribution maps of collection locations
of MNHN specimens off the Kerguelen Islands. A:
before 1970; B: from 1970 to 1980; C: from 1980
to 1990; D: from 1990 to 2000; E: since 2000.

Figure 7. - Conceptual scheme of the Pecheker database.

Database for the management of fishing resources of
Terres Australes et Antarctiques Françaises (TAAF):
Pecheker

In 1982, the first database, called Kerpeche, was created
following the development of trawl fisheries around Kerguelen.
In 2008, this database was modified to integrate
all the fisheries, including trawl, longline and trap. This
new database, named Pecheker, was financed by the French
Fisheries Ministry as part of a project, the Atlas of the
French Fisheries, which aims to share fisheries data through
a French web portal. This project is managed by the Institut
français pour l'Exploitation de la mer (IFREMER), the
French Institut de Recherche pour le Développement (IRD),
the Agrocampus-ouest in Rennes
(www.agrocampus-ouest.fr/halieutique) and the MNHN.

Pecheker includes all ichthyological data from the French 
Sub-Antarctic Territories from 1979 to present, and is used 
by the MNHN for fisheries management. 

Other information may also be found in Pecheker, such
as ship descriptions (size, shipowners, crews), data on fishing
effort (date, position, depth), data on catches of commercial
and bycatch fish species, biological data for commercial fish
(size, weight), and observations of birds and marine mammals.

The work carried out under the Atlas of the French
Fisheries project has not only developed the basic data of
Pecheker but also integrated all data from TAAF's fisheries
in a single system. Pecheker gathers all the spatial data, both
capture and biometrics, for each fishing station from the
establishment of the Exclusive Economic Zone (EEZ) in 1979
to the present day. In total, more than 92 000 station records
and one million biometric data records are stored on MNHN
servers.


Database for the data management of scientific 
expeditions (BasExp) 

Currently under development, this database will be
integrated into SIMPA. It aims to describe the various surveys
that took place on the Kerguelen Plateau and to identify the
people involved and the publications produced from a
particular research survey.

The first project, funded by the MNHN Département
des milieux et peuplements aquatiques (DMPA), identified
thirty cruises on the Kerguelen Plateau from 1929 to 2006,
including purely scientific expeditions and cruises for the
evaluation and exploration of fishery resources. More
than 400 publications dealing with marine specimens of the
Sub-Antarctic zones in TAAF were identified, and more than
300 people were involved in some part of the acquisition and
dissemination of knowledge. This pioneering work on the
data inventory allows further searches for primary data in
the content of these publications. By contacting the various
people who have been involved in research on the Kerguelen
Plateau, it is still possible to identify additional data,
publications or reports not previously recorded.

The database was extended to cover data from the Antarctic in
2009 with funding from the French National Research
Agency (ANR) through the program Antarctic Shelf as a
species flock generator (Antflock). This aimed at managing
the data from the Ceamarc cruises on the Adelie's Plateau in
Eastern Antarctica by the Umitaka-Maru and Aurora Australis
vessels in 2008. Moreover, the purpose of this database is
to cover the full dataset provided by scientific expeditions
and not only the collection data. For example,
the Ceamarc cruise shows that only about 70% of
the sampled specimens have been preserved for collection
(Denys, 2009), but the database BasExp allows analysis of
100% of the specimens sampled. Several categories of information
were defined according to their nature: data describing the
expeditions, station data, and biological data.

Development of the database continued in 2010 with new funding
from ANR through the program Global investigation of the
distribution of endangered Antarctic seabirds (Glides). All data
from offshore research programs that took place on the eastern
Kerguelen Plateau from 1995 to 2000 onboard La Curieuse
(Ipeker, Ichtyoker and Kerguelen-Amsterdam (Kerams)
cruises) have been registered. BasExp grew by more
than 1 000 sampling stations around Kerguelen, with
analysis of over 20 000 specimens.

DISCUSSION 

Data incorporated into SIMPA have already been shared
with numerous users, contributing to several published articles
and initiatives such as the field guide Guide regional des
poissons de Kerguelen (Duhamel et al., 2005). These data
have also been used in the studies reported during the first
symposium on the Kerguelen Plateau held in Concarneau in
April 2010.

Museum collections are mainly used for taxonomic and
phylogenetic research on specific taxa. The information
from the collection database contributes to global databases
like GBIF, providing the opportunity to perform taxonomic
studies on a very large scale, like the review of the lanternfish
genus Bolinichthys (Hulley and Duhamel, 2010), or
family-level studies (Duhamel et al., 2010). The collection
database can also be used to give an overview of the species
distribution of specific taxa, and it gives historical time
series of the presence or absence of specimens in a particular
area. It can also be used as an indicator of the history of
scientific research. Moreover, it completes the studies of
past expeditions. For the Kerguelen Plateau from 1874 to the
present day, figures 5 and 6 show the relationship between
the availability of research vessels and the production of new
data records.

The ichthyological data from the Pecheker database are
used frequently in reports produced by the MNHN
for the French Fisheries Ministry, CCAMLR bodies and other
partners. They have also been used to model fishing effort (Lord
et al., 2006). Access to this database also provides an
opportunity to study the impact of the fisheries on the
Kerguelen Plateau's ecosystem (Delord et al., 2010). Data will
be available soon through a web portal devoted to the French
Fisheries Atlas. The web site is funded by the Direction des
Pêches maritimes et de l'aquaculture (DPMA) (See 2010,
Diffusion des données des observatoires halieutiques et de la
réglementation des pêches; Portail halieutique DPMA, atlas
halieutique).

The accessibility of data from SIMPA makes ecosystem
modeling possible. The first model of the Kerguelen Plateau
was produced (Pruvost et al., 2005) using Ecopath with
Ecosim (Christensen and Walters, 2004). The distribution of the
lanternfish Electrona antarctica off the Kerguelen
Plateau has also been modelled with the data from the
Ichtyoker cruise (Loots et al., 2007). Access to available
data for the Kerguelen marine area through SIMPA, and for
the Sub-Antarctic and Antarctic areas through ScarMarBin,
provides the opportunity to develop large-scale
eco-regionalisations (Koubbi et al., 2011).

CONCLUSION 

All available data have been incorporated into SIMPA. The
database needs to be developed and updated continuously to
accommodate additional information from other scientific
cruises. Currently, this work is mainly focused on
ichthyological data. It is now necessary to extend the dataset to
other taxonomic groups.


Moreover, it is also important to develop the database 
towards providing superior accessibility and visibility to 
users. To establish good relationships between the owners of 
the data and the database manager, it is necessary to adopt 
access rules and advice to protect the data. It is necessary 
to develop a web service that allows people to deposit their 
data and to use available information. 

SIMPA will link with other marine ecosystem databases
through the use of taxonomic standards and universal
computer languages like Darwin Core and ABCD.
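
To illustrate what exchange through Darwin Core can involve, the sketch below maps an internal specimen record to standard Darwin Core term names and writes the result as CSV. The internal field names are hypothetical, and the coordinates in the example are placeholders rather than recorded values; only the catalogue number, species name and year come from the text.

```python
import csv
import io

# Darwin Core terms used in this sketch, in output column order.
DWC_FIELDS = [
    "institutionCode",
    "catalogNumber",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def to_darwin_core(record: dict) -> dict:
    """Map a hypothetical internal specimen record to Darwin Core terms."""
    return {
        "institutionCode": record["institution"],
        "catalogNumber": record["catalogue_number"],
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": record["scientific_name"],
        "eventDate": record["collection_date"],
        "decimalLatitude": record["latitude"],
        "decimalLongitude": record["longitude"],
    }


def to_csv(records: list) -> str:
    """Serialize records as a CSV text with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(to_darwin_core(record))
    return buffer.getvalue()


example_specimen = {
    "institution": "MNHN",
    "catalogue_number": "1890-0100",
    "scientific_name": "Channichthys rhinoceratus Richardson, 1844",
    "collection_date": "1874",
    "latitude": -49.5,   # placeholder position, not a recorded value
    "longitude": 70.0,   # placeholder position, not a recorded value
}
```

Keeping the mapping in one function means the internal schema can change without affecting the exchanged format.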

SIMPA has proven its usefulness in the improvement 
of knowledge and understanding of the Kerguelen area; its 
potential can be tapped in the future to assess areas in need 
of biodiversity protection. It will continue to improve as a 
tool for fisheries management and ecosystem modeling. 